05. Assessing Data

Using Pandas, explore winequality-red.csv and winequality-white.csv in the Jupyter notebook, then answer the quiz questions that follow about these characteristics of the datasets:

  • number of samples in each dataset
  • number of columns in each dataset
  • features with missing values
  • duplicate rows in the white wine dataset
  • number of unique values for quality in each dataset
  • mean density of the red wine dataset

This data was originally taken from here.
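
As a starting point for the exploration described above, here is a minimal sketch of loading one of the files with Pandas. The helper name load_wine and the toy SAMPLE text are hypothetical, and the semicolon separator is an assumption about how the CSV files are delimited; pass sep="," if your copies use commas.

```python
import io

import pandas as pd

# Toy stand-in for the first lines of a wine CSV.
# The values are invented for illustration only.
SAMPLE = "fixed acidity;density;quality\n7.4;0.9978;5\n7.8;0.9968;5\n"


def load_wine(source, sep=";"):
    """Read a wine-quality CSV (path or file-like object) into a DataFrame.

    Assumption: the file is semicolon-delimited.
    """
    return pd.read_csv(source, sep=sep)


# Parse the in-memory sample exactly as a file on disk would be parsed.
sample_df = load_wine(io.StringIO(SAMPLE))
print(sample_df.head())
```

In the notebook, the same helper would be called with a file name instead, for example load_wine("winequality-red.csv").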

QUESTION:

How many samples of red wine are there?

SOLUTION:

NOTE: The solutions are expressed as RegEx patterns. Udacity uses these patterns to check the given answer.
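
The sample count asked about here, and the column count asked about in a later question, can both be read from a DataFrame's shape attribute. The sketch below uses a hypothetical helper, describe_shape, on a toy frame with invented values; it does not reproduce the real answers.

```python
import pandas as pd


def describe_shape(df):
    """Return (number of samples, number of columns) for a DataFrame."""
    n_rows, n_cols = df.shape
    return n_rows, n_cols


# Toy frame with invented values: 3 samples, 2 columns.
toy = pd.DataFrame({"density": [0.9978, 0.9968, 0.9970], "quality": [5, 5, 6]})
print(describe_shape(toy))
```

Calling the helper once per dataset gives the numbers needed for the red wine, white wine, and column-count questions.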

QUESTION:

How many samples of white wine are there?

SOLUTION:

QUESTION:

How many columns are in each dataset?

SOLUTION:

QUESTION:

Which features have missing values?

SOLUTION:
  • None of these features have missing values
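
One way to check for missing values is to count the nulls in each column. The following is a minimal sketch; the helper names and the toy frame (which deliberately contains one missing pH value) are hypothetical and are not taken from the wine files.

```python
import pandas as pd


def missing_by_column(df):
    """Count missing (NaN/None) entries in each column."""
    return df.isnull().sum()


def columns_with_missing(df):
    """List the names of columns that contain at least one missing value."""
    counts = missing_by_column(df)
    return list(counts[counts > 0].index)


# Toy frame with one deliberately missing pH value (invented data).
toy = pd.DataFrame({"pH": [3.51, float("nan"), 3.26], "quality": [5, 5, 6]})
print(missing_by_column(toy))
print(columns_with_missing(toy))
```

An empty list from columns_with_missing means no feature has missing values. The non-null counts printed by df.info() offer a quick cross-check.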

QUESTION:

How many duplicate rows are in the white wine dataset?

SOLUTION:

QUESTION:

Are the duplicate rows in these datasets significant, i.e. do they need to be dropped?

SOLUTION: Not necessarily
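
Duplicate rows can be counted with DataFrame.duplicated, which flags every row that exactly repeats an earlier one. The sketch below uses a hypothetical helper and a toy frame with invented values. It also shows how to inspect the duplicates before deciding whether to drop them, since that decision is a judgment call.

```python
import pandas as pd


def count_duplicates(df):
    """Number of rows that exactly repeat an earlier row."""
    return int(df.duplicated().sum())


# Toy frame (invented data): rows 1 and 3 repeat row 0.
toy = pd.DataFrame(
    {"density": [0.9978, 0.9978, 0.9968, 0.9978], "quality": [5, 5, 6, 5]}
)

n_dupes = count_duplicates(toy)

# keep=False marks every member of a duplicated group, which helps when
# inspecting the rows before deciding what to do with them.
all_copies = toy[toy.duplicated(keep=False)]

# drop_duplicates returns a new frame; the original is left untouched.
deduped = toy.drop_duplicates()

print(n_dupes, len(all_copies), len(deduped))
```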

QUESTION:

How many unique values of quality are in the red wine dataset?

SOLUTION:

QUESTION:

How many unique values of quality are in the white wine dataset?

SOLUTION:

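
For the two questions on unique quality values, Series.nunique gives the count and Series.unique gives the values themselves. This is a minimal sketch with a hypothetical helper and invented toy data.

```python
import pandas as pd


def unique_quality(df, column="quality"):
    """Return (count of distinct values, sorted list of those values)."""
    values = sorted(df[column].unique().tolist())
    return df[column].nunique(), values


# Toy frame (invented data) with three distinct quality scores.
toy = pd.DataFrame({"quality": [5, 5, 6, 7, 6]})
print(unique_quality(toy))
```

Running the helper on each dataset separately gives one count for red and one for white.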

QUESTION:

What is the mean density in the red wine dataset?

SOLUTION: 0.996747
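
A column mean like the one above can be computed with Series.mean. The sketch below uses a hypothetical helper on invented toy values, so its result differs from the red wine figure; rounding to six decimals matches the precision of the stated answer.

```python
import pandas as pd


def mean_of(df, column="density"):
    """Mean of one numeric column, as a plain Python float."""
    return float(df[column].mean())


# Toy frame with invented density values.
toy = pd.DataFrame({"density": [0.9978, 0.9968, 0.9970]})

toy_mean = mean_of(toy)
print(round(toy_mean, 6))
```

df.describe() reports the same mean alongside other summary statistics for every numeric column.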